Evaluation of the SVM based Multi-Fonts Kanji Character Recognition Method for Early-Modern Japanese Printed Books
Authors
Abstract
The National Diet Library in Japan provides a web-based digital archive of early-modern printed books as page images. To make better use of the digital archive, the book images should be converted to text data. In this paper, we evaluate an SVM-based multi-font Kanji character recognition method for early-modern Japanese printed books. Using several sets of Kanji characters clipped from books by different publishers, we obtain a recognition rate of more than 92% for 256 kinds of Kanji characters. This shows that our recognition method, which uses the PDC (Peripheral Direction Contributivity) feature of Kanji character images for training and recognition with an SVM, is effective for multi-font Kanji character recognition in early-modern Japanese printed books.
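The abstract does not define the PDC feature, so the following is only a minimal sketch of the general idea behind a peripheral, direction-based feature, not the authors' implementation. It assumes a binarized character image given as a list of rows of 0/1 values, scans each row and column from the image border, and at the first black pixel records the normalized stroke run lengths in four orientations. The function names (`run_length`, `direction_contributivity`, `peripheral_feature`) and the choice of four orientations and a single scan depth are assumptions made for illustration.

```python
import math

# Four stroke orientations (assumed): horizontal, vertical, two diagonals.
ORIENTATIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def run_length(img, r, c, dr, dc):
    """Count consecutive black pixels through (r, c) along +/-(dr, dc)."""
    h, w = len(img), len(img[0])
    total = 1  # the pixel at (r, c) itself
    for sign in (1, -1):
        rr, cc = r + sign * dr, c + sign * dc
        while 0 <= rr < h and 0 <= cc < w and img[rr][cc]:
            total += 1
            rr, cc = rr + sign * dr, cc + sign * dc
    return total


def direction_contributivity(img, r, c):
    """Run lengths at a black pixel, scaled to unit Euclidean norm."""
    runs = [run_length(img, r, c, dr, dc) for dr, dc in ORIENTATIONS]
    norm = math.sqrt(sum(v * v for v in runs))
    return [v / norm for v in runs]


def peripheral_feature(img):
    """Scan rows and columns from all four borders; at the first black
    pixel on each scan line, append its direction contributivity."""
    h, w = len(img), len(img[0])
    scans = []
    for r in range(h):
        scans.append([(r, c) for c in range(w)])            # from the left
        scans.append([(r, c) for c in reversed(range(w))])  # from the right
    for c in range(w):
        scans.append([(r, c) for r in range(h)])            # from the top
        scans.append([(r, c) for r in reversed(range(h))])  # from the bottom
    feature = []
    for path in scans:
        hit = next(((r, c) for r, c in path if img[r][c]), None)
        if hit is not None:
            feature.extend(direction_contributivity(img, *hit))
        else:
            feature.extend([0.0] * len(ORIENTATIONS))  # empty scan line
    return feature


# Toy example: a 5x5 image containing a single horizontal stroke.
bar = [[0] * 5, [0] * 5, [1] * 5, [0] * 5, [0] * 5]
dc = direction_contributivity(bar, 2, 0)
vector = peripheral_feature(bar)
```

For the horizontal stroke, the first component of `dc` (the horizontal orientation) dominates. In a full pipeline, one such vector per character image, together with its label, would be passed to an SVM classifier for training; any standard SVM implementation could play that role, and published PDC variants typically also pool the values over border segments and several scan depths, which this sketch omits.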
Similar resources
An Effective and Interactive Training Data Collection Method for Early-Modern Japanese Printed Character Recognition
In this paper, we present a web application that supports the efficient collection of training data for early-modern Japanese printed character recognition. The National Diet Library in Japan provides many early-modern (AD 1868-1945) Japanese printed books to the public, but full-text search is essentially impossible. In order to perform advanced search in historical literature, it is required ex...
A Modified Self-organizing Map Neural Network to Recognize Multi-font Printed Persian Numerals (RESEARCH NOTE)
This paper proposes a new method to distinguish the printed digits, regardless of font and size, using neural networks. Unlike our proposed method, existing neural network based techniques are only able to recognize the trained fonts. These methods need a large database containing digits in various fonts. New fonts are often introduced to the public, which may not be truly recognized by the Opti...
Examining two extreme cases of kanji recognition by Japanese using magnetoencephalography
This paper reports two extreme cases of kanji recognition by Japanese participants using magnetoencephalography (MEG). MEG is a non-invasive technique to measure magnetic field activities generated by neural activities in the brain. MEG was used to investigate neural activities when participants saw visually presented kanji and other objects. In this study we will report two extreme cases; one,...
Survey of Pattern Recognition Approaches in Japanese Character Recognition
Optical Character Recognition (OCR) in Japanese, both handwritten and printed, is difficult to perform, owing to several reasons. Firstly, the Japanese language is comprised of over 3000 characters which can be classified as syllabic characters, or Kana, and ideographic characters, called Kanji. Secondly, Japanese text does not have delimiters like spaces, separating different words. Thirdly, s...
Font Descriptor Construction for Printed Thai Character Recognition
The evolution of fonts into many different types has a great impact on the recognition performance of optical character recognition (OCR) systems. Greater font diversity leads to lower recognition accuracy, particularly for Thai fonts. In order to overcome this obstacle, this paper proposes a font descriptor for printed Thai-character recognition. The role of such a descriptor is a representative...